Provably Efficient Reinforcement Learning with Linear Function Approximation
Authors
Abstract
Modern reinforcement learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially given the need to manage the exploration/exploitation trade-off. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate function approximation? This question persists even in a basic setting with linear dynamics and linear rewards, for which only linear function approximation is needed. This paper presents the first provable RL algorithm with both polynomial run time and polynomial sample complexity in this linear setting, without requiring a "simulator" or additional assumptions. Concretely, we prove that an optimistic modification of least-squares value iteration (a classical algorithm frequently studied in the linear setting) achieves Õ(√(d³H³T)) regret, where d is the ambient dimension of the feature space, H is the length of each episode, and T is the total number of steps. Importantly, such regret is independent of the number of states and actions. Funding: This work was supported by the Defense Advanced Research Projects Agency program on Lifelong Learning Machines.
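The optimistic least-squares value iteration the abstract describes can be sketched as follows. This is a minimal illustrative implementation in the spirit of the paper's algorithm on a toy tabular MDP encoded with one-hot features; the environment, the bonus scale `beta`, and all constants are assumptions for illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP encoded as a linear MDP via one-hot features phi(s, a).
n_states, n_actions, H = 3, 2, 3
d = n_states * n_actions
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions
R = rng.uniform(size=(n_states, n_actions))                        # rewards in [0, 1]

def phi(s, a):
    v = np.zeros(d)
    v[s * n_actions + a] = 1.0
    return v

beta, lam = 1.0, 1.0       # exploration-bonus scale and ridge parameter (illustrative)
K = 100                    # number of episodes

Lam = [lam * np.eye(d) for _ in range(H)]   # Gram matrices, one per step h
data = [[] for _ in range(H)]               # stored (phi, next state, reward) triples
w = np.zeros((H, d))

def Q(h, s, a):
    # Optimistic Q-value: ridge-regression estimate plus an elliptical bonus,
    # clipped at H since returns are bounded by the horizon.
    p = phi(s, a)
    bonus = beta * np.sqrt(p @ np.linalg.solve(Lam[h], p))
    return min(w[h] @ p + bonus, H)

total_reward = 0.0
for k in range(K):
    # Backward pass: ridge regression of r + max_a' Q_{h+1}(s', a') on features.
    for h in reversed(range(H)):
        b = np.zeros(d)
        for p, s2, r in data[h]:
            v_next = 0.0 if h == H - 1 else max(Q(h + 1, s2, a) for a in range(n_actions))
            b += p * (r + v_next)
        w[h] = np.linalg.solve(Lam[h], b)
    # Forward pass: act greedily w.r.t. the optimistic Q and record the data.
    s = 0
    for h in range(H):
        a = max(range(n_actions), key=lambda a: Q(h, s, a))
        r = R[s, a]
        s2 = rng.choice(n_states, p=P[s, a])
        p = phi(s, a)
        Lam[h] += np.outer(p, p)
        data[h].append((p, s2, r))
        total_reward += r
        s = s2

print(total_reward / K)  # average per-episode return, bounded by H
```

The elliptical bonus β·√(φᵀΛ⁻¹φ) shrinks for frequently visited state-action directions, which is what drives exploration toward under-sampled features.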
Similar resources
Convergent Combinations of Reinforcement Learning with Linear Function Approximation
Convergence for iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient ...
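The setting this snippet describes can be sketched as repeated synchronous TD(0)-style sweeps over one fixed batch of transitions, here on a small deterministic chain with one-hot features; the chain, step size, and sweep count are illustrative assumptions.

```python
import numpy as np

# Fixed batch of transitions (s, r, s') from a 5-state cycle with a single
# reward; linear values via one-hot features, updated synchronously.
n, gamma, alpha = 5, 0.9, 0.1
transitions = [(s, 1.0 if s == n - 1 else 0.0, (s + 1) % n) for s in range(n)]

phi = np.eye(n)            # feature matrix: phi[s] is the feature of state s
w = np.zeros(n)

for _ in range(5000):      # repeated synchronous sweeps over the fixed set
    grad = np.zeros(n)
    for s, r, s2 in transitions:
        td_error = r + gamma * w @ phi[s2] - w @ phi[s]
        grad += td_error * phi[s]
    w += alpha * grad / len(transitions)

print(np.round(w, 3))      # converges to the values solving V = r + gamma P V
```

Because every transition in the fixed set is applied in each sweep, the update behaves like a deterministic fixed-point iteration rather than a stochastic one.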
Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function. Moreover, we characterise the different solutions as images of the optimal exact value funct...
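As one concrete instance of "different solutions for different objectives", the sketch below contrasts the TD fixed point (the LSTD solution) with the Bellman-residual least-squares solution on a tiny cycle with a single restrictive feature, where the two objectives give distinct weights. Everything here (the chain, rewards, and feature) is an illustrative assumption, not the paper's construction.

```python
import numpy as np

# Two classical linear policy-evaluation solutions fit to the same data:
# the TD fixed point (LSTD) and the Bellman-residual minimizer.
gamma = 0.9
P = np.roll(np.eye(4), 1, axis=1)            # deterministic cycle 0->1->2->3->0
r = np.array([0.0, 0.0, 0.0, 1.0])
Phi = np.array([[1.0], [2.0], [3.0], [4.0]])  # a single restrictive feature

M = Phi - gamma * P @ Phi                     # "residual features" Phi - gamma*P*Phi

# LSTD / TD fixed point: solve Phi^T M w = Phi^T r.
w_td = np.linalg.solve(Phi.T @ M, Phi.T @ r)

# Bellman-residual solution: minimize ||M w - r||^2 in the least-squares sense.
w_br, *_ = np.linalg.lstsq(M, r, rcond=None)

print(float(w_td[0]), float(w_br[0]))         # two distinct "optimal" weights
```

With a full-rank feature matrix the two solutions coincide with the exact value function; the gap only appears once the representation is restricted, which is exactly the regime the snippet discusses.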
Reinforcement Learning with Linear Function Approximation and LQ control Converges
Reinforcement learning is commonly used with function approximation. However, very few positive results are known about the convergence of function approximation based RL control algorithms. In this paper we show that TD(0) and Sarsa(0) with linear function approximation are convergent for a simple class of problems, where the system is linear and the costs are quadratic (the LQ control problem)...
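A sketch of the kind of LQ instance this snippet refers to: TD(0) policy evaluation for a stable scalar linear system with quadratic cost, where the value function V(x) = w·x² is exactly linear in the feature φ(x) = x². The system, step size, and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

a, q, gamma = 0.8, 1.0, 0.9   # closed-loop dynamics x' = a*x, cost q*x^2
w, alpha = 0.0, 0.05

for _ in range(20000):
    x = rng.uniform(-1, 1)            # sample a state to update from
    x2 = a * x                        # deterministic closed-loop step
    cost = q * x * x
    td_error = cost + gamma * w * x2 * x2 - w * x * x
    w += alpha * td_error * (x * x)   # linear-FA TD(0) update on phi(x) = x^2

w_true = q / (1 - gamma * a * a)      # fixed point of the Bellman equation
print(w, w_true)
```

Because the closed-loop system is stable (|a| < 1) and the feature matches the quadratic value function exactly, the TD(0) iterate contracts toward the Bellman fixed point q/(1 - γa²).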
Residual Algorithms: Reinforcement Learning with Function Approximation
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general function-approximation system, such as a sigmoidal multilayer perceptron, a radial-basisfunction system, a memory-based learning syst...
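The instability this snippet describes, and the residual-algorithm fix, can be seen in a classic two-state counterexample: semi-gradient TD diverges under repeated updates of a single transition, while the residual-gradient update (gradient descent on the squared Bellman residual) remains stable. This is a minimal illustrative sketch; the features and constants are assumptions.

```python
# Two-state example with features 1 and 2, so V = [w, 2w], zero reward.
# Only the transition 0 -> 1 is ever updated (off-distribution sampling).
gamma, alpha = 0.99, 0.1
phi = [1.0, 2.0]                  # feature of state 0 and state 1
w_td = w_rg = 0.1                 # both start at the same nonzero weight

for _ in range(200):
    # Semi-gradient TD: treats the bootstrapped target as a constant.
    delta_td = gamma * w_td * phi[1] - w_td * phi[0]
    w_td += alpha * delta_td * phi[0]
    # Residual gradient: true gradient of delta^2 / 2 w.r.t. w.
    delta_rg = gamma * w_rg * phi[1] - w_rg * phi[0]
    w_rg -= alpha * delta_rg * (gamma * phi[1] - phi[0])

print(w_td, w_rg)  # TD weight blows up, residual-gradient weight shrinks to 0
```

Each TD step multiplies the weight by 1 + α(γ·2 − 1) > 1, so it diverges geometrically; the residual-gradient step multiplies it by 1 − α(γ·2 − 1)² < 1, so it decays, matching the stability guarantee of residual algorithms.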
Sample-Efficient Evolutionary Function Approximation for Reinforcement Learning
Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent’s optimal value function. In most real-world problems, learning this value function requires a function approximator, which maps state-action pairs to values via a concise, parameterized function. In practice, the success of function approximators depends on the ability of ...
Journal
Journal title: Mathematics of Operations Research
Year: 2023
ISSN: 0364-765X, 1526-5471
DOI: https://doi.org/10.1287/moor.2022.1309